The goal of this markdown is to describe the differential expression of MPNST samples compared to other NF1 tumors. We are using the patient tumor data only from a diverse number of datasets described in this Synapse Table.
The data is stored in 4 different synapse tables so we need to query and pull down those datasets and join them together. This comprises all the human-related RNA-seq data in the NF data portal.
## ── Attaching packages ───────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1 ✓ purrr 0.3.3
## ✓ tibble 2.1.3 ✓ dplyr 0.8.3
## ✓ tidyr 1.0.0 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Parsed with column specification:
## cols(
## tableId = col_character()
## )
## Parsed with column specification:
## cols(
## ROW_ID = col_double(),
## ROW_VERSION = col_double(),
## specimenID = col_character(),
## individualID = col_character(),
## Symbol = col_character(),
## totalCounts = col_double(),
## zScore = col_double(),
## tumorType = col_character(),
## nf1Genotype = col_character(),
## sex = col_character(),
## isCellLine = col_logical()
## )
## Parsed with column specification:
## cols(
## ROW_ID = col_double(),
## ROW_VERSION = col_double(),
## specimenID = col_character(),
## individualID = col_character(),
## Symbol = col_character(),
## totalCounts = col_double(),
## zScore = col_double(),
## tumorType = col_character(),
## nf1Genotype = col_character(),
## sex = col_character(),
## isCellLine = col_logical()
## )
## Parsed with column specification:
## cols(
## ROW_ID = col_double(),
## ROW_VERSION = col_double(),
## specimenID = col_character(),
## individualID = col_character(),
## Symbol = col_character(),
## totalCounts = col_double(),
## zScore = col_double(),
## tumorType = col_character(),
## nf1Genotype = col_character(),
## sex = col_character(),
## isCellLine = col_logical()
## )
## Parsed with column specification:
## cols(
## ROW_ID = col_double(),
## ROW_VERSION = col_double(),
## specimenID = col_character(),
## individualID = col_character(),
## Symbol = col_character(),
## totalCounts = col_double(),
## zScore = col_double(),
## tumorType = col_character(),
## nf1Genotype = col_character(),
## sex = col_character(),
## isCellLine = col_logical()
## )
Now that we have the data we can use the counts to calculate differential expression between MPNST and other tumors.
We use DESeq2 to calculate differential expression between the 20 MPNST samples and the other NF1-related tumors.
## Loading required package: S4Vectors
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:dplyr':
##
## combine, intersect, setdiff, union
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind, colMeans,
## colnames, colSums, dirname, do.call, duplicated, eval, evalq,
## Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply,
## lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int,
## pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans, rownames,
## rowSums, sapply, setdiff, sort, table, tapply, union, unique,
## unsplit, which, which.max, which.min
##
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:dplyr':
##
## first, rename
## The following object is masked from 'package:tidyr':
##
## expand
## The following object is masked from 'package:base':
##
## expand.grid
## Loading required package: IRanges
##
## Attaching package: 'IRanges'
## The following objects are masked from 'package:dplyr':
##
## collapse, desc, slice
## The following object is masked from 'package:purrr':
##
## reduce
## Loading required package: GenomicRanges
## Loading required package: GenomeInfoDb
## Loading required package: SummarizedExperiment
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
## Loading required package: DelayedArray
## Loading required package: matrixStats
##
## Attaching package: 'matrixStats'
## The following objects are masked from 'package:Biobase':
##
## anyMissing, rowMedians
## The following object is masked from 'package:dplyr':
##
## count
## Loading required package: BiocParallel
##
## Attaching package: 'DelayedArray'
## The following objects are masked from 'package:matrixStats':
##
## colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges
## The following object is masked from 'package:purrr':
##
## simplify
## The following objects are masked from 'package:base':
##
## aperm, apply
## converting counts to integer mode
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## 36 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
## -- replacing outliers and refitting for 12087 genes
## -- DESeq argument 'minReplicatesForReplace' = 7
## -- original counts are preserved in counts(dds)
## estimating dispersions
## fitting model and testing
## 16 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
## [1] "tumorTypeCutaneous.Neurofibroma"
## [2] "tumorTypeEpendymoma"
## [3] "tumorTypeGanglioglioma"
## [4] "tumorTypeHigh.Grade.Glioma"
## [5] "tumorTypeLow.Grade.Glioma"
## [6] "tumorTypeMalignant.Peripheral.Nerve.Sheath.Tumor"
## [7] "tumorTypeNeurofibroma"
## [8] "tumorTypeOther"
## [9] "tumorTypePlexiform.Neurofibroma"
## [10] "sexFemale"
## [11] "sexmale"
## [12] "sexMale"
We filter with those genes that exhibit a log2 fold change of over 2 and an adjusted p.value < 0.01 to get 1626 genes.
Next step is to use functional enrichment tools to compare these genes to various gene sets (GO, KEGG, REACTOME) to see what is showing up in these genes that is of interest. This evaluates the genes in the table above.
##
## clusterProfiler v3.10.1 For help: https://guangchuangyu.github.io/software/clusterProfiler
##
## If you use clusterProfiler in published research, please cite:
## Guangchuang Yu, Li-Gen Wang, Yanyan Han, Qing-Yu He. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012, 16(5):284-287.
##
## Attaching package: 'clusterProfiler'
## The following object is masked from 'package:DelayedArray':
##
## simplify
## The following object is masked from 'package:purrr':
##
## simplify
Here we calculate the enrichment in KEGG terms. These are primarily pathways.
There is not much there! Maybe we are too stringent?
Now we can try GO enrichment. We focus on the ‘biological process’ ontology as we are more interested in those terms.
## Loading required package: org.Hs.eg.db
## Loading required package: AnnotationDbi
##
## Attaching package: 'AnnotationDbi'
## The following object is masked from 'package:dplyr':
##
## select
##